Don’t forget to document your functions! ({roxygen}-style)
#' Read and clean data
#'
#' Reads in the penguins data, renames and selects relevant columns. The
#' following transformations are applied to the data:
#' * only keep species common name
#' * extract observation year
#' * remove rows with missing values
#'
#' @param file Character, path to the penguins data .csv file.
#' @returns A tibble.
read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |>
    janitor::clean_names() |>
    dplyr::mutate(
      ## modifying columns
    ) |>
    dplyr::select(
      ## all relevant columns
    ) |>
    tidyr::drop_na()
}
Step 1: Turn your code into functions
Improved script:
R/helper_functions.R
#' Read and clean data
#'
#' ...
read_data <- function(file) { ... }

#' Violin plot of variable per species and sex
#'
#' ...
violin_plot <- function(df, yvar) { ... }

#' Scatter plot of bill length vs depth
#'
#' ...
plot_bill_length_depth <- function(df) { ... }
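With the helper functions in place, the pipeline itself is declared in a `_targets.R` file at the project root. The sketch below is one way such a file might look, using the target names that appear in the run log. The data path `data/penguins_raw.csv`, the column names passed as `yvar`, and the way `yvar` is passed (as a bare column name) are assumptions for illustration, not taken from the original script.

```r
# _targets.R (sketch; path and yvar arguments are assumptions)
library(targets)

# Make read_data(), violin_plot() and plot_bill_length_depth() available
tar_source("R/helper_functions.R")

list(
  # format = "file" makes {targets} track the content of the file,
  # so that downstream targets re-run when the data file changes
  tar_target(
    penguins_raw_file,
    here::here("data/penguins_raw.csv"),  # hypothetical path
    format = "file"
  ),

  # Read and clean the data
  tar_target(penguins_df, read_data(penguins_raw_file)),

  # One target per plot; assumes violin_plot() accepts a bare column name
  tar_target(body_mass_plot, violin_plot(penguins_df, body_mass_g)),
  tar_target(bill_scatterplot, plot_bill_length_depth(penguins_df)),
  tar_target(flipper_length_plot, violin_plot(penguins_df, flipper_length_mm))
)
```

Running `targets::tar_make()` from the project root then executes the targets in dependency order and prints a log of dispatched and completed targets.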
here() starts at C:/Users/hrpoab/Desktop/GitHub/palmerpenguins_analysis
> dispatched target penguins_raw_file
o completed target penguins_raw_file [0 seconds]
> dispatched target penguins_df
o completed target penguins_df [0.85 seconds]
> dispatched target body_mass_plot
o completed target body_mass_plot [0.16 seconds]
> dispatched target bill_scatterplot
o completed target bill_scatterplot [0.02 seconds]
> dispatched target flipper_length_plot
o completed target flipper_length_plot [0.02 seconds]
> ended pipeline [1.31 seconds]
Get the pipeline results
targets::tar_read(bill_scatterplot)
Change in a step
Hi Olivia,
Great work! Just a minor comment, could you change the colours in the bill length/depth scatter-plot? It’s hard to see the difference between the species.
here() starts at C:/Users/hrpoab/Desktop/GitHub/palmerpenguins_analysis
v skipped target penguins_raw_file
v skipped target penguins_df
v skipped target body_mass_plot
> dispatched target bill_scatterplot
o completed target bill_scatterplot [0.35 seconds]
v skipped target flipper_length_plot
> ended pipeline [0.53 seconds]
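Only `bill_scatterplot` was re-run because only the code it depends on changed. The toy example below illustrates that idea in base R. It is not how {targets} is implemented (the package keeps its own hashes and metadata store), and the function names `needs_rerun`, `plot_v1`, `plot_v2` and `read_v1` are hypothetical.

```r
# Toy illustration of change detection; NOT the {targets} implementation.
# A step needs to re-run when the code of its function has changed.
needs_rerun <- function(old_fn, new_fn) {
  !identical(deparse(body(old_fn)), deparse(body(new_fn)))
}

# Hypothetical stand-ins for a plotting function before and after an edit
plot_v1 <- function(df) "default colours"
plot_v2 <- function(df) "new colour palette"

# Hypothetical stand-in for a function that was left untouched
read_v1 <- function(file) "read and clean"

needs_rerun(plot_v1, plot_v2)  # TRUE: the plot must be regenerated
needs_rerun(read_v1, read_v1)  # FALSE: the step can be skipped
```

In a real project, `targets::tar_outdated()` lists the targets that would be re-run by the next call to `targets::tar_make()`.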
Change in a step
targets::tar_read(bill_scatterplot)
Change in the data
Hi Olivia,
Oopsie! We realised there was a mistake in the original data file. Here is the updated spreadsheet, could you re-run the analysis with this version?
targets::tar_visnetwork()
Writing a report with Quarto
reports/palmerpenguins_report.qmd
---
title: |
  Analysis of penguins measurements from the palmerpenguins dataset
author: "Olivia Angelin-Bonnet"
date: today
format:
  docx:
    number-sections: true
---

```{r setup}
#| include: false
library(knitr)
opts_chunk$set(echo = FALSE)
```

This project aims at understanding the differences between
the size of three species of penguins (Adelie, Chinstrap
and Gentoo) observed in the Palmer Archipelago, Antarctica,
using data collected by Dr Kristen Gorman between 2007 and
2009.

## Distribution of body mass and flipper length

@fig-body-mass shows the distribution of body mass (in
grams) across the three penguins species. We can see that
on average, the Gentoo penguins are the heaviest,
with Adelie and Chinstrap penguins more similar in terms
of body mass. Within a species, the females are on average
lighter than the males.

```{r fig-body-mass}
#| fig-cap: "Distribution of penguin body mass ..."

# code for plot
```

Similarly, Gentoo penguins have the longest flippers on
average (@fig-flipper-length), and Adelie penguins the
shortest. Again, females from a species have shorter
flippers on average than the males.

```{r fig-flipper-length}
#| fig-cap: "Distribution of penguin flipper length ..."

# code for plot
```
Quarto + {targets}
Two advantages of using a Quarto document alongside {targets}:
can read in results from the {targets} pipeline inside the report → no computation is done during report generation
can add the rendering of the report as a step in the pipeline → ensures that the report is always up to date
Quarto + {targets}
Two steps to use the {targets} pipeline results in a Quarto document:
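The two steps are not spelled out in this text. A common way to do it, sketched here as an assumption rather than as the original slide content, is to read stored results with `targets::tar_read()` inside the report chunks, and to register the report as a target with `tarchetypes::tar_quarto()`. The target name `report` is hypothetical.

```r
# Step 1 (assumed): inside a chunk of reports/palmerpenguins_report.qmd,
# e.g. the fig-body-mass chunk, read the stored plot instead of recomputing it
targets::tar_read(body_mass_plot)

# Step 2 (assumed): in _targets.R, add the report to the list of targets.
# tar_quarto() scans the .qmd file for tar_read()/tar_load() calls, so the
# report is re-rendered whenever one of those targets changes.
tarchetypes::tar_quarto(
  report,  # hypothetical target name
  path = "reports/palmerpenguins_report.qmd"
)
```

The first snippet belongs in the Quarto document and the second in `_targets.R`; they are shown together here only for brevity.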
Presentation for Rladies Sydney, online, 7 April 2025
Publication data: Angelin-Bonnet O. April 2025. Reproducible analysis pipelines with {targets}. A Plant & Food Research presentation. SPTS No. 26935.
Presentation prepared by: Olivia Angelin-Bonnet Statistical Scientist, Data Science April 2025
Presentation approved by: Mark Wohlers Science Group Leader, Data Science April 2025
For more information contact: Olivia Angelin-Bonnet DDI: +64 6 355 6156 Email: olivia.angelin-bonnet@plantandfood.co.nz
This report has been prepared by The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research). Head Office: 120 Mt Albert Road, Sandringham, Auckland 1025, New Zealand, Tel: +64 9 925 7000, Fax: +64 9 925 7001. www.plantandfood.co.nz